Resiliency in IO Tools for Scientific Computing
Abstract
IO tools in numerical simulations often serve two functions: writing and reading files to restart calculations, and writing and processing diagnostic files, including files for graphics post-processing. For diagnostic files, tools that work directly with high-level data structures are desirable, such as meshes with adaptive mesh refinement, unstructured meshes, and the association between meshes and variables. Typically, a single file in a numerical simulation is produced by several or many write operations. One question about the resiliency of IO tools is the following: if a simulation crashes while writing a file, can an IO tool still read the high-level data structures, the meshes and their associated variables, from the incompletely written file? This paper describes how a parallel IO library, HIO, was developed and modified toward resiliency. The goal of the library is to provide sustainable, interoperable, efficient, scalable, and convenient tools for parallel IO and data management for high-level data structures in numerical simulations. The high-level data structures include multi-dimensional arrays, structured meshes, unstructured meshes, meshes generated through adaptive mesh refinement, variables associated with these meshes, and data defined on particles in particle simulations. The library is based on MPI-IO. Compared with MPI-IO, the overhead of writing the users' explicit data structures is very small. To further improve IO performance, the library can buffer problem-size data in addition to metadata, while keeping the users' explicit high-level data structures. The buffering mechanism improves IO performance by a factor of 10 to 20 in multiphysics simulations. For resiliency, we have designed and implemented an additional procedure so that the library can read all the high-level data within a file even if the file was incompletely written because a simulation was interrupted. The cost of this resiliency is very small in both performance and space.
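The abstract does not say how HIO recovers data from an incompletely written file. As a rough illustration of the general idea only, the Python sketch below uses a self-describing record layout: each write appends a small header (magic tag, lengths, CRC32) followed by the data, so a reader can scan from the start and keep every record that was written completely. The record format and the names `pack_record` and `recover_records` are hypothetical and are not part of HIO or MPI-IO.

```python
import struct
import zlib

# Hypothetical on-disk layout (an assumption, not HIO's format):
# magic tag, name length, payload length, CRC32 of name + payload.
MAGIC = b"REC1"
HEADER = struct.Struct("<4sIQI")


def pack_record(name: str, payload: bytes) -> bytes:
    """Serialize one named data block as header + name + payload."""
    name_bytes = name.encode("utf-8")
    crc = zlib.crc32(name_bytes + payload)
    return HEADER.pack(MAGIC, len(name_bytes), len(payload), crc) + name_bytes + payload


def recover_records(blob: bytes) -> dict:
    """Return every record that was fully written; stop at the first damaged one."""
    records = {}
    offset = 0
    while offset + HEADER.size <= len(blob):
        magic, name_len, payload_len, crc = HEADER.unpack_from(blob, offset)
        body_start = offset + HEADER.size
        body_end = body_start + name_len + payload_len
        # A wrong tag or a body that runs past the end of the file means
        # the write was interrupted here.
        if magic != MAGIC or body_end > len(blob):
            break
        body = blob[body_start:body_end]
        if zlib.crc32(body) != crc:
            break
        records[body[:name_len].decode("utf-8")] = body[name_len:]
        offset = body_end
    return records


if __name__ == "__main__":
    full = pack_record("mesh", b"\x01\x02\x03") + pack_record("density", b"\x04" * 16)
    truncated = full[:-5]  # simulate a crash during the second write
    print(sorted(recover_records(full)))
    print(sorted(recover_records(truncated)))
```

In this sketch the space overhead is a fixed 20-byte header per record, and a truncated file still yields every record that precedes the interrupted write. A parallel library would additionally have to coordinate record offsets across processes, which the sketch leaves out.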